CLAIMS 



1. (original) A method comprising the steps of:
(a) converting a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and
(b) embedding the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that:
a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and
a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.
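
As an illustration of the backward-compatible embedding recited in claim 1, the following minimal sketch assumes a hypothetical chunked container (4-byte tag, 4-byte big-endian length, payload) in which a receiver skips any chunk whose tag it does not recognize. The tags `AUDI` and `CUES` and all function names are assumptions made for this example; the claim does not prescribe any particular container or embedding technique.

```python
import struct

def pack_chunk(tag: bytes, payload: bytes) -> bytes:
    """Serialize one chunk: 4-byte tag, 4-byte big-endian length, payload."""
    return tag + struct.pack(">I", len(payload)) + payload

def embed(audio: bytes, params: bytes) -> bytes:
    """Embed the scene parameters alongside the combined audio (hypothetical layout)."""
    return pack_chunk(b"AUDI", audio) + pack_chunk(b"CUES", params)

def iter_chunks(stream: bytes):
    """Yield (tag, payload) pairs from the stream."""
    pos = 0
    while pos + 8 <= len(stream):
        tag = stream[pos:pos + 4]
        (length,) = struct.unpack(">I", stream[pos + 4:pos + 8])
        yield tag, stream[pos + 8:pos + 8 + length]
        pos += 8 + length

def legacy_receive(stream: bytes) -> bytes:
    """Second receiver: only knows AUDI chunks and skips everything else."""
    return b"".join(p for tag, p in iter_chunks(stream) if tag == b"AUDI")

def aware_receive(stream: bytes):
    """First receiver: extracts both the combined audio and the parameters."""
    audio, params = b"", b""
    for tag, payload in iter_chunks(stream):
        if tag == b"AUDI":
            audio += payload
        elif tag == b"CUES":
            params += payload
    return audio, params
```

In this sketch the parameters are transparent to the legacy receiver simply because unknown chunks are ignored; other embedding techniques could achieve the same effect.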

2. (original) The invention of claim 1, wherein the plurality of auditory scene parameters comprise two or more different sets of one or more auditory scene parameters, wherein each set of auditory scene parameters corresponds to a different frequency band in the combined audio signal such that the first receiver synthesizes the auditory scene by (a) dividing an input audio signal into a plurality of different frequency bands; and (b) applying the two or more different sets of one or more auditory scene parameters to two or more of the different frequency bands in the input audio signal to generate two or more synthesized audio signals of the auditory scene, wherein for each of the two or more different frequency bands, the corresponding set of one or more auditory scene parameters is applied to the input audio signal as if the input audio signal corresponded to a single audio source in the auditory scene.

3. (original) The invention of claim 2, wherein each set of one or more auditory scene parameters corresponds to a different audio source in the auditory scene.

4. (original) The invention of claim 2, wherein, for at least one of the sets of one or more auditory scene parameters, at least one of the auditory scene parameters corresponds to a combination of two or more different audio sources in the auditory scene that takes into account relative dominance of the two or more different audio sources in the auditory scene.

5. (original) The invention of claim 2, wherein the two or more synthesized audio signals comprise left and right audio signals of a binaural signal corresponding to the auditory scene.

6. (original) The invention of claim 2, wherein the two or more synthesized audio signals comprise three or more signals of a multi-channel audio signal corresponding to the auditory scene.

7. (original) The invention of claim 1, wherein the combined audio signal corresponds to a combination of two or more different mono source signals, wherein the two or more different frequency bands are selected by comparing magnitudes of the two or more different mono source signals, wherein, for each of the two or more different frequency bands, one of the mono source signals dominates the one or more other mono source signals.
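
The magnitude comparison in claim 7 can be sketched as follows. This is only an illustration under the assumption that per-band magnitudes of each mono source are already available as nested lists; the function name and data layout are hypothetical and not taken from the claims.

```python
def dominant_source_per_band(band_magnitudes):
    """Return, for each band, the index of the source with the largest magnitude.

    band_magnitudes[s][b] is the magnitude of mono source s in frequency band b
    (an assumed layout for this example).
    """
    n_sources = len(band_magnitudes)
    n_bands = len(band_magnitudes[0])
    return [
        max(range(n_sources), key=lambda s: band_magnitudes[s][b])
        for b in range(n_bands)
    ]
```

For two sources with band magnitudes `[0.9, 0.1, 0.4]` and `[0.2, 0.8, 0.5]`, the first source dominates the first band and the second source dominates the other two.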

8. (original) The invention of claim 1, wherein the combined audio signal corresponds to a combination of left and right audio signals of a binaural signal, wherein each different set of one or more auditory scene parameters is generated by comparing the left and right audio signals in a corresponding frequency band.

9. (original) The invention of claim 1, wherein the auditory scene parameters comprise one or more of an interaural level difference, an interaural time delay, and a head-related transfer function.
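
To make the parameters named in claims 8 and 9 concrete, the sketch below estimates an interaural level difference (in dB) and an interaural time delay (in samples) by comparing left and right signals that are assumed to be already restricted to one frequency band. The power-ratio and cross-correlation estimators are common textbook choices used here as assumptions; the claims do not specify how the comparison is performed.

```python
import math

def ild_db(left, right, eps=1e-12):
    """Level difference of left relative to right, in dB (power ratio)."""
    p_left = sum(x * x for x in left)
    p_right = sum(x * x for x in right)
    return 10.0 * math.log10((p_left + eps) / (p_right + eps))

def itd_samples(left, right, max_lag):
    """Lag d maximizing sum(left[n] * right[n + d]).

    A positive result means the right signal lags the left signal.
    """
    best_lag, best_corr = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        corr = sum(
            left[n] * right[n + d]
            for n in range(len(left))
            if 0 <= n + d < len(right)
        )
        if corr > best_corr:
            best_corr, best_lag = corr, d
    return best_lag
```

With a right signal that is a half-amplitude copy of the left signal delayed by two samples, this yields an ILD of about 6 dB and an ITD of 2 samples.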

10. (original) The invention of claim 1, wherein step (b) comprises the step of applying a layered coding technique in which stronger error protection is provided to the combined audio signal than to the auditory scene parameters when generating the embedded audio signal, such that errors due to transmission over a lossy channel will tend to affect the auditory scene parameters before affecting the combined audio signal to improve the probability of the first receiver to process at least the combined audio signal.
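
The unequal error protection described in claim 10 might be sketched as below. A three-fold repetition code with bitwise majority voting stands in for the stronger protection applied to the combined audio, while the parameters are sent unprotected; this particular code, the stream layout, and the function names are assumptions for illustration only.

```python
def protect(data: bytes) -> bytes:
    """Strong protection (assumed): send three copies of the data."""
    return data * 3

def recover(coded: bytes) -> bytes:
    """Bitwise majority vote across the three copies."""
    n = len(coded) // 3
    a, b, c = coded[:n], coded[n:2 * n], coded[2 * n:3 * n]
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))

def layered_encode(audio: bytes, params: bytes) -> bytes:
    """Protected audio layer followed by the unprotected parameter layer."""
    return protect(audio) + params

def layered_decode(stream: bytes, audio_len: int):
    """Split the stream and recover the audio; params are taken as received."""
    protected_len = 3 * audio_len
    return recover(stream[:protected_len]), stream[protected_len:]
```

If one byte of the protected layer and one byte of the parameter layer are corrupted in transit, the audio is still recovered exactly while the parameters arrive damaged, which is the degradation order the claim describes.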

11. (original) The invention of claim 1, wherein step (b) comprises the step of applying a multi-descriptive coding technique in which the auditory scene parameters and the combined audio signal are both divided into two or more streams, wherein each stream divided from the auditory scene parameters is embedded into a corresponding stream divided from the combined audio signal to form a stream of the embedded audio signal, such that the two or more streams of the embedded audio signal may be transmitted over two or more different channels to the first receiver, such that the first receiver is able to synthesize the auditory scene using extracted auditory scene parameters having relatively coarse resolution when errors result from transmission of one or more of the streams of the embedded audio signal over one or more lossy channels.
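
A minimal sketch of the multi-descriptive idea in claim 11, assuming the simplest possible split: even-indexed and odd-indexed elements of both the audio and the parameters form two descriptions. When one description is lost, the survivor is expanded by repeating each value, giving a coarser-resolution result. The even/odd split and the hold-based reconstruction are assumptions chosen for brevity, not details from the claim.

```python
def split_descriptions(audio, params):
    """Form two descriptions, each pairing an audio stream with a parameter stream."""
    return [(audio[0::2], params[0::2]), (audio[1::2], params[1::2])]

def _interleave(even, odd):
    """Re-interleave two streams; even may be one element longer than odd."""
    out = []
    for i, value in enumerate(even):
        out.append(value)
        if i < len(odd):
            out.append(odd[i])
    return out

def _hold(stream):
    """Coarse reconstruction from a single stream: repeat each value twice."""
    return [value for value in stream for _ in range(2)]

def merge_descriptions(first, second):
    """Rebuild (audio, params); pass None for a description lost in transit."""
    if first is not None and second is not None:
        return _interleave(first[0], second[0]), _interleave(first[1], second[1])
    survivor = first if first is not None else second
    return _hold(survivor[0]), _hold(survivor[1])
```

With both descriptions the original sequences are restored exactly; with only one, every second value is a repeat of its neighbor.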

12. (original) A machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method, comprising the steps of:
(a) converting a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and
(b) embedding the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that:
a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and
a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.

13. (original) An apparatus comprising:
(a) an encoder configured to convert a plurality of input audio signals into a combined audio signal and a plurality of auditory scene parameters; and
(b) a merging module configured to embed the auditory scene parameters into the combined audio signal to generate an embedded audio signal, such that:
a first receiver that is aware of the existence of the embedded auditory scene parameters can extract the auditory scene parameters from the embedded audio signal and apply the extracted auditory scene parameters to synthesize an auditory scene; and
a second receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the second receiver.

14. (original) A method for synthesizing an auditory scene, comprising the steps of:
(a) receiving an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver;
(b) extracting the auditory scene parameters from the embedded audio signal; and
(c) applying the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.

15. (original) The invention of claim 14, wherein the plurality of auditory scene parameters comprise two or more different sets of one or more auditory scene parameters, wherein each set of auditory scene parameters corresponds to a different frequency band in the combined audio signal such that the auditory scene is synthesized by (1) dividing the combined audio signal into a plurality of different frequency bands; and (2) applying the two or more different sets of one or more auditory scene parameters to two or more of the different frequency bands in the combined audio signal to generate two or more synthesized audio signals of the auditory scene, wherein for each of the two or more different frequency bands, the corresponding set of one or more auditory scene parameters is applied to the combined audio signal as if the combined audio signal corresponded to a single audio source in the auditory scene.

16. (original) The invention of claim 15, wherein each set of one or more auditory scene parameters corresponds to a different audio source in the auditory scene.

17. (original) The invention of claim 15, wherein, for at least one of the sets of one or more auditory scene parameters, at least one of the auditory scene parameters corresponds to a combination of two or more different audio sources in the auditory scene that takes into account relative dominance of the two or more different audio sources in the auditory scene.

18. (original) The invention of claim 15, wherein the two or more synthesized audio signals comprise left and right audio signals of a binaural signal corresponding to the auditory scene.

19. (original) The invention of claim 15, wherein the two or more synthesized audio signals comprise three or more signals of a multi-channel audio signal corresponding to the auditory scene.

20. (original) The invention of claim 14, wherein the combined audio signal corresponds to a combination of two or more different mono source signals, wherein the two or more different frequency bands are selected by comparing magnitudes of the two or more different mono source signals, wherein, for each of the two or more different frequency bands, one of the mono source signals dominates the one or more other mono source signals.

21. (original) The invention of claim 14, wherein the combined audio signal corresponds to a combination of left and right audio signals of a binaural signal, wherein each different set of one or more auditory scene parameters is generated by comparing the left and right audio signals in a corresponding frequency band.

22. (original) The invention of claim 14, wherein the auditory scene parameters comprise one or more of an interaural level difference, an interaural time delay, and a head-related transfer function.

23. (original) The invention of claim 14, wherein the embedded audio signal was generated by applying a layered coding technique in which stronger error protection was provided to the combined audio signal than to the auditory scene parameters, such that errors due to transmission over a lossy channel will tend to affect the auditory scene parameters before affecting the combined audio signal to improve the probability of a receiver to process at least the combined audio signal.

24. (original) The invention of claim 14, wherein the embedded audio signal was generated by applying a multi-descriptive coding technique in which the auditory scene parameters and the combined audio signal were both divided into two or more streams, wherein each stream divided from the auditory scene parameters was embedded into a corresponding stream divided from the combined audio signal to form a stream of the embedded audio signal, such that the two or more streams of the embedded audio signal may be transmitted over two or more different channels to a receiver, such that the receiver is able to synthesize the auditory scene using extracted auditory scene parameters having relatively coarse resolution when errors result from transmission of one or more of the streams of the embedded audio signal over one or more lossy channels.

25. (original) A machine-readable medium, having encoded thereon program code, wherein, when the program code is executed by a machine, the machine implements a method for synthesizing an auditory scene, comprising the steps of:
(a) receiving an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver;
(b) extracting the auditory scene parameters from the embedded audio signal; and
(c) applying the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.

26. (original) An apparatus for synthesizing an auditory scene, comprising:
(a) a dividing module configured to (1) receive an embedded audio signal comprising a combined audio signal embedded with a plurality of auditory scene parameters, wherein a receiver that is unaware of the existence of the embedded auditory scene parameters can process the embedded audio signal to generate an output audio signal, where the embedded auditory scene parameters are transparent to the receiver and (2) extract the auditory scene parameters from the embedded audio signal; and
(b) a decoder configured to apply the extracted auditory scene parameters to the combined audio signal to synthesize an auditory scene.

27. (new) A method for encoding C input audio channels to generate E transmitted audio channels, the method comprising:
providing two or more of the C input channels in a frequency domain;
generating one or more cue codes for each of one or more different frequency bands in the two or more input channels in the frequency domain; and
downmixing the C input channels to generate the E transmitted channels, where C>E≥1.
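
The three steps of claim 27 can be sketched for the simplest case of C=2 input channels and E=1 transmitted channel. The sketch assumes a naive DFT as the frequency-domain transform, an inter-channel level difference in dB as the single cue code per band, and a time-domain average as the downmix; `band_edges` (bin indices delimiting the bands) and all function names are hypothetical choices for this example rather than details taken from the claim.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for a short illustration)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def encode(ch1, ch2, band_edges, eps=1e-12):
    """Return (downmix, icld_db) for two equal-length input channels."""
    # Step 1: provide the input channels in a frequency domain.
    spec1, spec2 = dft(ch1), dft(ch2)

    # Step 2: one cue code per band, here the level of ch2 relative to ch1 in dB.
    icld_db = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        p1 = sum(abs(spec1[k]) ** 2 for k in range(lo, hi))
        p2 = sum(abs(spec2[k]) ** 2 for k in range(lo, hi))
        icld_db.append(10.0 * math.log10((p2 + eps) / (p1 + eps)))

    # Step 3: downmix C=2 input channels into E=1 transmitted channel.
    downmix = [(a + b) / 2.0 for a, b in zip(ch1, ch2)]
    return downmix, icld_db
```

For an 8-sample frame with `band_edges = [0, 2, 5]`, two bands are analyzed (bins 0 to 1 and bins 2 to 4). If the second channel is the first channel scaled by 2, every band reports a level difference of about 6.02 dB.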

28. (new) The invention of claim 27, further comprising formatting the E transmitted channels and the one or more cue codes into a transmission format such that:
the format enables a first audio decoder having no knowledge of the existence of the one or more cue codes to generate E playback audio channels based on the E transmitted channels and independent of the one or more cue codes; and
the format enables a second audio decoder having knowledge of the existence of the one or more cue codes to generate more than E playback audio channels based on the E transmitted channels and the one or more cue codes.

29. (new) The invention of claim 28, wherein the format enables the second audio decoder to generate C playback audio channels based on the E transmitted channels and the one or more cue codes.

30. (new) The invention of claim 27, wherein E=1.

31. (new) The invention of claim 27, wherein E>1.

32. (new) The invention of claim 27, wherein each of the E transmitted channels is based on two or more of the C input channels.

33. (new) The invention of claim 27, wherein the one or more cue codes comprise one or more of inter-channel level difference (ICLD) data and inter-channel time difference (ICTD) data.

34. (new) The invention of claim 33, wherein the one or more cue codes comprise ICLD data and ICTD data.

35. (new) The invention of claim 27, wherein the downmixing comprises, for each of one or more different frequency bands, downmixing the two or more input channels in the frequency domain into one or more downmixed channels in the frequency domain.

36. (new) The invention of claim 35, wherein the downmixing further comprises converting the one or more downmixed channels from the frequency domain into one or more of the transmitted channels in the time domain.

37. (new) An audio coder for encoding C input audio channels to generate E transmitted audio channels, the audio coder comprising:
means for providing two or more of the C input channels in a frequency domain;
means for generating one or more cue codes for each of one or more different frequency bands in the two or more input channels in the frequency domain; and
means for downmixing the C input channels to generate the E transmitted channels, where C>E≥1.

38. (new) Apparatus for encoding C input audio channels to generate E transmitted audio channels, the apparatus comprising:
two or more filter banks adapted to convert two or more of the C input channels from a time domain into a frequency domain;
a code estimator adapted to generate one or more cue codes for each of one or more different frequency bands in the two or more converted input channels; and
a downmixer adapted to downmix the C input channels to generate the E transmitted channels, where C>E≥1.

39. (new) The invention of claim 38, wherein the apparatus is adapted to format the E transmitted channels and the one or more cue codes into a transmission format such that:
the format enables a first audio decoder having no knowledge of the existence of the one or more cue codes to generate E playback audio channels based on the E transmitted channels and independent of the one or more cue codes; and
the format enables a second audio decoder having knowledge of the existence of the one or more cue codes to generate more than E playback audio channels based on the E transmitted channels and the one or more cue codes.

40. (new) The invention of claim 39, wherein the format enables the second audio decoder to generate C playback audio channels based on the E transmitted channels and the one or more cue codes.

41. (new) The invention of claim 38, wherein E=1.

42. (new) The invention of claim 38, wherein E>1.

43. (new) The invention of claim 38, wherein each of the E transmitted channels is based on two or more of the C input channels.

44. (new) The invention of claim 38, wherein the one or more cue codes comprise one or more of ICLD data and ICTD data.

45. (new) The invention of claim 44, wherein the one or more cue codes comprise ICLD data and ICTD data.

46. (new) The invention of claim 38, wherein the downmixer is adapted, for each of one or more different frequency bands, to downmix the two or more converted input channels into one or more downmixed channels in the frequency domain.

47. (new) The invention of claim 46, further comprising one or more inverse filter banks adapted to convert the one or more downmixed channels from the frequency domain into one or more of the transmitted channels in the time domain.

48. (new) The invention of claim 38, wherein:
the apparatus is a system selected from the group consisting of a digital video recorder, a digital audio recorder, a computer, a satellite transmitter, a cable transmitter, a terrestrial broadcast transmitter, and an entertainment system; and
the system comprises the two or more filter banks, the code estimator, and the downmixer.

49. (new) An encoded audio bitstream generated by encoding C input audio channels to generate E transmitted audio channels, wherein:
two or more of C input channels are provided in a frequency domain;
one or more cue codes are generated for each of one or more different frequency bands in the two or more input channels in the frequency domain;
the C input channels are downmixed to generate E transmitted channels, where C>E≥1; and
the E transmitted channels and the one or more cue codes are encoded into the encoded audio bitstream.

50. (new) The invention of claim 49, wherein the encoded audio bitstream has a transmission format such that:
the format enables a first audio decoder having no knowledge of the existence of the one or more cue codes to generate E playback audio channels based on the E transmitted channels and independent of the one or more cue codes; and
the format enables a second audio decoder having knowledge of the existence of the one or more cue codes to generate more than E playback audio channels based on the E transmitted channels and the one or more cue codes.

51. (new) The invention of claim 50, wherein the format enables the second audio decoder to generate C playback audio channels based on the E transmitted channels and the one or more cue codes.

52. (new) An encoded audio bitstream comprising E transmitted channels and one or more cue codes, wherein:
the one or more cue codes are generated by:
providing two or more of C input audio channels in a frequency domain; and
generating one or more cue codes for each of one or more different frequency bands in the two or more input channels in the frequency domain; and
the E transmitted channels are generated by downmixing the C input channels, where C>E≥1.

53. (new) The invention of claim 52, wherein the encoded audio bitstream has a transmission format such that:
the format enables a first audio decoder having no knowledge of the existence of the one or more cue codes to generate E playback audio channels based on the E transmitted channels and independent of the one or more cue codes; and
the format enables a second audio decoder having knowledge of the existence of the one or more cue codes to generate more than E playback audio channels based on the E transmitted channels and the one or more cue codes.

54. (new) The invention of claim 53, wherein the format enables the second audio decoder to generate C playback audio channels based on the E transmitted channels and the one or more cue codes.

55. (new) A method for decoding E transmitted audio channels to generate C playback audio channels, the method comprising:
upmixing, for each of one or more different frequency bands, one or more of the E transmitted channels in a frequency domain to generate two or more of the C playback channels in the frequency domain, where C>E≥1;
applying one or more cue codes to each of the one or more different frequency bands in the two or more playback channels in the frequency domain to generate two or more modified channels; and
converting the two or more modified channels from the frequency domain into a time domain.
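
The decoding steps of claim 55 can be sketched for E=1 transmitted channel and C=2 playback channels. The sketch assumes a naive DFT and inverse DFT, upmixing by copying the transmitted spectrum to both playback channels, and a single inter-channel level difference in dB per band as the cue code. The gain normalization (the two gains squared sum to 2) and the names `decode` and `band_edges` are assumptions for this example, not details taken from the claim.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]

def decode(mono, icld_db, band_edges):
    """Return two playback channels from one transmitted channel and per-band ICLDs."""
    n = len(mono)
    spec = dft(mono)
    # Upmix: both playback channels start as copies of the transmitted spectrum.
    out1, out2 = list(spec), list(spec)
    for lo, hi, level in zip(band_edges[:-1], band_edges[1:], icld_db):
        ratio = 10.0 ** (level / 20.0)        # amplitude ratio out2 / out1
        g1 = math.sqrt(2.0 / (1.0 + ratio * ratio))
        g2 = ratio * g1
        for k in range(lo, hi):
            # Scale bin k and its mirror bin so the time signals stay real.
            for idx in {k, (n - k) % n}:
                out1[idx] = spec[idx] * g1
                out2[idx] = spec[idx] * g2
    # Convert the modified channels back to the time domain.
    return idft(out1), idft(out2)
```

For an 8-sample frame, `band_edges = [0, 2, 5]` covers bins 0 to 4 and, through mirroring, the whole spectrum. Applying a level difference of 20·log10(2) dB in every band produces a second playback channel with twice the amplitude of the first.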

56. (new) The invention of claim 55, further comprising, prior to upmixing, converting the one or more of the E transmitted channels from the time domain to the frequency domain.

57. (new) The invention of claim 55, wherein E=1.

58. (new) The invention of claim 55, wherein E>1.

59. (new) The invention of claim 55, wherein each of the C playback channels is based on at least one of the E transmitted channels and at least one cue code.

60. (new) The invention of claim 55, wherein the one or more cue codes comprise one or more of ICLD data and ICTD data.

61. (new) The invention of claim 60, wherein the one or more cue codes comprise ICLD data and ICTD data.

62. (new) The invention of claim 55, wherein the upmixing comprises, for each of one or more different frequency bands, upmixing at least two of the E transmitted channels into at least one playback channel in the frequency domain.

63. (new) An audio decoder for decoding E transmitted audio channels to generate C playback audio channels, the audio decoder comprising:
means for upmixing, for each of one or more different frequency bands, one or more of the E transmitted channels in a frequency domain to generate two or more of the C playback channels in the frequency domain, where C>E≥1;
means for applying one or more cue codes to each of the one or more different frequency bands in the two or more playback channels in the frequency domain to generate two or more modified channels; and
means for converting the two or more modified channels from the frequency domain into a time domain.

64. (new) An apparatus for decoding E transmitted audio channels to generate C playback audio channels, the apparatus comprising:
an upmixer adapted, for each of one or more different frequency bands, to upmix one or more of the E transmitted channels in a frequency domain to generate two or more of the C playback channels in the frequency domain, where C>E≥1;
a synthesizer adapted to apply one or more cue codes to each of the one or more different frequency bands in the two or more playback channels in the frequency domain to generate two or more modified channels; and
one or more inverse filter banks adapted to convert the two or more modified channels from the frequency domain into a time domain.

65. (new) The invention of claim 64, further comprising one or more filter banks adapted to convert, prior to the upmixing, the one or more of the E transmitted channels from the time domain to the frequency domain.

66. (new) The invention of claim 64, wherein E=1.

67. (new) The invention of claim 64, wherein E>1.

68. (new) The invention of claim 64, wherein each of the C playback channels is based on at least one of the E input channels and at least one cue code.

69. (new) The invention of claim 64, wherein the one or more cue codes comprise one or more of ICLD data and ICTD data.

70. (new) The invention of claim 69, wherein the one or more cue codes comprise ICLD data and ICTD data.

71. (new) The invention of claim 64, wherein the upmixer is adapted, for each of one or more different frequency bands, to upmix at least two of the E transmitted channels into at least one playback channel in the frequency domain.

72. (new) The invention of claim 64, wherein:
the apparatus is a system selected from the group consisting of a digital video player, a digital audio player, a computer, a satellite receiver, a cable receiver, a terrestrial broadcast receiver, and an entertainment system; and
the system comprises the upmixer, the synthesizer, and the one or more inverse filter banks.